add Lsn type #64

Merged

ericseppanen merged 8 commits into neondatabase:main from lsn_type on Apr 26, 2021
Conversation

Contributor

@ericseppanen commented Apr 23, 2021

This type is a zero-cost wrapper for a u64, meant to help code communicate with precision what that value means, and prevent any type mix-ups.

It implements Display and Debug. Display will format as 1234ABCD/5678CDEF while Debug will format as Lsn(1234567890).
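For example, here's a sketch of the kind of mix-up the newtype catches at compile time (wait_for_lsn is a made-up function for illustration, not code from this branch):

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

fn wait_for_lsn(lsn: Lsn) {
    let _ = lsn;
}

fn main() {
    let lsn = Lsn(0x1234_ABCD_5678_CDEF);
    let _block_no: u64 = 42;

    wait_for_lsn(lsn);            // ok
    // wait_for_lsn(_block_no);   // compile error: expected `Lsn`, found `u64`
}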

This isn't ready to merge; it's for code review purposes only.
There are lots of FIXME comments that want feedback.

Fixes #49.

Contributor

@knizhnik left a comment

Well, I cannot agree that it is a "zero-cost wrapper".
Please look at this simple program:

use std::time::Instant;
use std::ops::{Add};

#[derive(Debug, Clone, Copy, Eq, Ord, PartialEq, PartialOrd)]
pub struct Lsn(pub u64);

impl From<u64> for Lsn {
    fn from(n: u64) -> Self {
        Lsn(n)
    }
}

impl Add<u64> for Lsn {
    type Output = Lsn;

    fn add(self, other: u64) -> Self::Output {
        // panic if the addition overflows.
        Lsn(self.0.checked_add(other).unwrap())
    }
}
fn inc_lsn_struct(lsn: Lsn) -> Lsn {
	lsn + 1
}

fn inc_lsn(lsn: u64) -> u64 {
	lsn + 1
}

pub fn main() {
	const ITERATIONS: u64 = 1000000000;
	{
		let now = Instant::now();
		let mut lsn = Lsn(0);
		for _ in 1..ITERATIONS {
			lsn = inc_lsn_struct(lsn);
		}
		println!("Elapsed for LSN {:?} struct {:?}", lsn, now.elapsed());
	}
	{
		let now = Instant::now();
		let mut lsn : u64 = 0;
		for _ in 1..ITERATIONS {
			lsn = inc_lsn(lsn);
		}
		println!("Elapsed for LSN {:?} struct {:?}", lsn, now.elapsed());
	}
}

Results on my notebook are the following:

knizhnik@xps:~$ rustc -O call.rs
knizhnik@xps:~$ ./call 
Elapsed for LSN Lsn(999999999) struct 27.057152ms
Elapsed for LSN 999999999 struct 39ns
knizhnik@xps:~$ rustc call.rs
knizhnik@xps:~$ ./call 
Elapsed for LSN Lsn(999999999) struct 45.84106434s
Elapsed for LSN 999999999 struct 30.92782111s

The difference is about 1.5x for non-optimized code and roughly 1000x for optimized code.
I am not sure the optimizer isn't eliminating the calls entirely in the latter case. And the speed is still very high; I do not think we will notice any difference if we use this Lsn representation.

But it is still not possible to say that the cost is zero :)

Contributor

@hlinnaka left a comment

Overall, looks good to me.

impl fmt::Display for Lsn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // FIXME: should this be "{:X}/{:08X}" ?
        write!(f, "{:X}/{:X}", self.0 >> 32, self.0 & 0xffffffff)
Contributor

The snprintf pattern used in postgres is "%X/%X", so "{:X}/{:X}" is in line with that. That said, Postgres input functions do accept the extra 0's. And the extra zeros would be helpful when you're eyeballing different LSN values trying to see which is larger, like 0/10000000 vs 0/1200000.
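For illustration, the zero-padded variant would be a one-character change and would print 0/01200000 instead of 0/1200000 (a sketch of the alternative, not the code in this branch):

use std::fmt;

pub struct Lsn(pub u64);

impl fmt::Display for Lsn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // {:08X} pads the low 32 bits to 8 hex digits so values line up visually.
        write!(f, "{:X}/{:08X}", self.0 >> 32, self.0 & 0xffff_ffff)
    }
}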

Contributor Author

I'll defer to the people with more postgres knowledge. Which one would you like?

}

impl CacheKey {
    pub fn pack(&self, buf: &mut BytesMut) {
        self.tag.pack(buf);
-       buf.put_u64(self.lsn);
+       buf.put_u64(self.lsn.0);
Contributor

Why is lsn.0 needed, why not just lsn? I thought the impl From<Lsn> for u64 trait would make that work.

Contributor Author

The function takes only a u64:

fn put_u64(&mut self, n: u64);

There are a few ways to get automatic conversions, but none of them happen to be used here:

  • Rust always tries to do "autoref" of the self argument. This is what allows you to type, e.g. my_vec.len() instead of (&my_vec).len().
  • Some functions have generic parameters that can handle the coercion, e.g. one could write:

    fn put_u64<T>(&mut self, n: T)
    where
        T: Into<u64>,
    {
        let n: u64 = n.into();
        // do stuff with n
    }

  • You can do a similar generic parameter for references using AsRef or AsMut. You'll see this in the standard library a lot, e.g. File has open<P: AsRef<Path>>(path: P), and there's a long list of std types that implement AsRef<Path>.

The question mark operator (for propagating errors) uses Into<___> for whatever your returned error type is. So even if you're not using anyhow or thiserror, implementing Into (or more likely, From) for your error types will allow you to use a variety of error types with ?.
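As a quick illustration of that last point (parse_lsn and MyError are made-up names for this sketch, not part of the PR):

use std::num::ParseIntError;

#[derive(Debug)]
enum MyError {
    BadNumber(ParseIntError),
}

// Implementing From is what lets `?` convert the library's error into ours.
impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> Self {
        MyError::BadNumber(e)
    }
}

fn parse_lsn(s: &str) -> Result<u64, MyError> {
    // On failure, `?` calls MyError::from(ParseIntError) and returns early.
    let n = u64::from_str_radix(s, 16)?;
    Ok(n)
}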

Contributor Author

I do agree that buf.put_u64(self.lsn.0) isn't very friendly.

However, I think that all of this manual byte-pushing code should be replaced with serde anyway. As long as we #[derive(Serialize, Deserialize)] on the Lsn type then it will work exactly the same as u64 with no extra code at all.

Contributor

It's not a big deal, we can live with it. I was just surprised it didn't "just work".

However, I think that all of this manual byte-pushing code should be replaced with serde anyway. As long as we #[derive(Serialize, Deserialize)] on the Lsn type then it will work exactly the same as u64 with no extra code at all.

That handles this particular case, but I would imagine that this arises in many other places, too. One example off the top of my head: what if you want to store an Lsn in an AtomicU64?

I think this works, too:

    buf.put_u64(self.lsn.into());

Contributor Author

I haven't found a good crate that allows Atomic<T> where it's implemented as e.g. AtomicU64 if the size is 64 bits. I made one myself, but never quite finished it. Handling the case where T is not Copy gets quite nasty, but we don't need that here, so I could probably dig out the simple version (that requires T: Copy) if we want it.

Or we could just do a quick AtomicLsn wrapper, and wait to see how often we need to cut/paste that code...
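A quick wrapper along those lines might look like this (a sketch only; the names, API, and memory orderings here are assumptions, not the committed code):

use std::sync::atomic::{AtomicU64, Ordering};

pub struct Lsn(pub u64);

pub struct AtomicLsn(AtomicU64);

impl AtomicLsn {
    pub fn new(lsn: Lsn) -> Self {
        AtomicLsn(AtomicU64::new(lsn.0))
    }

    pub fn load(&self) -> Lsn {
        // On x86 a SeqCst load compiles to a plain load, so this stays cheap.
        Lsn(self.0.load(Ordering::SeqCst))
    }

    pub fn store(&self, lsn: Lsn) {
        self.0.store(lsn.0, Ordering::SeqCst);
    }
}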

Contributor Author

Another thought: the kind of argument mismatch that put_u64 hits gets less likely as we get to more complex types (that aren't Copy), because they're likely to be passed by reference.

If we have a bunch of functions that take by-reference arguments, e.g.

fn handle_mytype(mt: &MyType);

Then you'll note that the mt argument can be satisfied by any container that Derefs to a MyType. For example, Arc<MyType> or MutexGuard<'_, MyType> both impl Deref with Target = MyType, so you can write:

handle_mytype(&mt);

... and that code will continue to work whether mt is a MyType or Arc or MutexGuard (or some other container that we build ourselves).

So your instinct that wrapper types should "just work" is still partially true, at least for the bigger data structures.
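A self-contained example of that coercion (MyType and the values are made up for illustration):

use std::sync::{Arc, Mutex};

struct MyType {
    value: u64,
}

fn handle_mytype(mt: &MyType) {
    println!("value = {}", mt.value);
}

fn main() {
    let plain = MyType { value: 1 };
    let shared = Arc::new(MyType { value: 2 });
    let locked = Mutex::new(MyType { value: 3 });

    handle_mytype(&plain);                    // &MyType directly
    handle_mytype(&shared);                   // &Arc<MyType> derefs to &MyType
    handle_mytype(&locked.lock().unwrap());   // &MutexGuard<'_, MyType> derefs too
}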

Contributor

Would it be better to add pack/unpack methods to Lsn?
So that it is possible to write:

self.tag.pack(buf);
self.lsn.pack(buf);

Contributor Author

Would it be better to add pack/unpack methods to Lsn?

Yes, this can be done automatically with #[derive(serde::Serialize, serde::Deserialize)] and then we can use any serde-compatible library we like. The bincode crate seems to implement the raw serialization we need in most places (including endian conversions).

I would need a bit more time to look at the code, but the end result is that we should be able to write things like my_data_structure.serialize(&mut buffer) without ever needing to type put_u64 again.
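To make that concrete, here's a sketch of what the serde/bincode route could look like (the struct, field names, and bincode 1.x calls are assumptions about how we'd wire it up, not code from this PR):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq)]
pub struct Lsn(pub u64);

#[derive(Serialize, Deserialize, Debug, PartialEq)]
pub struct CacheEntry {
    // Lsn serializes exactly like a bare u64; no manual put_u64 calls needed.
    pub lsn: Lsn,
}

fn roundtrip(entry: &CacheEntry) -> CacheEntry {
    let bytes = bincode::serialize(entry).expect("serialize");
    bincode::deserialize(&bytes).expect("deserialize")
}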

There is one case I'm not certain about: when the network packet includes little-endian data that can be copied directly into a Rust struct with no validation. I don't think it's possible to generate that code with serde right now (see also #6), but if the performance is important to us, I think we can build tools to do that.

@ericseppanen
Contributor Author

Well, I cannot agree that it is a "zero-cost wrapper".
Please look at this simple program:

<snip>

The difference is about 1.5x for non-optimized code and roughly 1000x for optimized code.
I am not sure the optimizer isn't eliminating the calls entirely in the latter case. And the speed is still very high; I do not think we will notice any difference if we use this Lsn representation.

But it is still not possible to say that the cost is zero :)

Ah, but you cheated :)

The Lsn version is using checked_add to protect against overflow. If you change the u64 version to also use checked_add you will see identical performance.
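For an apples-to-apples comparison, the u64 function in your program would become something like this (a quick sketch, not measured here):

fn inc_lsn(lsn: u64) -> u64 {
    // Same overflow check as the Lsn version: panic if the addition overflows.
    lsn.checked_add(1).unwrap()
}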

It's true that you will see the extra layer of abstraction in a debug build, but in release mode Rust is very good at seeing right through abstractions like this.

Zero-cost abstractions are one of Rust's primary design goals. Rust optimization is a bit more powerful than C or C++, because the type system carries a lot more information. And the Rust optimizer will never break your code the way that often happens to C/C++ programs that didn't know they were relying on some hidden undefined behavior.

@ericseppanen
Contributor Author

And one of these days, Rust optimization will get even better! There has been a long fight against LLVM bugs, but in March the nightly compiler enabled the "mutable noalias" optimization, which means that code like this can be optimized (because mutable references are unique, these two references are known not to point to the same location):

fn adds(a: &mut i32, b: &mut i32) {
    // `a` and `b` can never alias, so the optimizer may read *b just once
    // and fold the two additions together.
    *a += *b;
    *a += *b;
}

This optimization has been enabled and reverted a few times (because of bugs in LLVM), so we might not see it in a stable release yet. But it will happen eventually.

@knizhnik
Contributor

knizhnik commented Apr 24, 2021

Ah, but you cheated :)

Sorry, I didn't expect that just one extra instruction (checking the overflow bit) could affect performance so dramatically.

@ericseppanen
Contributor Author

ericseppanen commented Apr 24, 2021

Updates:

  • rebased the branch
  • replaced sub with sub_checked (53fe4b4)
  • added an AtomicLsn type, to reduce some of the manual conversions (5dcff27)

if is_xlog_switch_record(&recordbuf) {
-    trace!(
-        "saw xlog switch record at {:X}/{:X}",
-        (self.lsn >> 32),
-        self.lsn & 0xffffffff
-    );
-    self.padlen = (WAL_SEGMENT_SIZE - (self.lsn % WAL_SEGMENT_SIZE)) as u32;
+    trace!("saw xlog switch record at {}", self.lsn);
+    self.padlen = (WAL_SEGMENT_SIZE - (self.lsn.0 % WAL_SEGMENT_SIZE)) as u32;
}

-if self.lsn % 8 != 0 {
-    self.padlen = 8 - (self.lsn % 8) as u32;
+// FIXME: what does this code do?
+if self.lsn.0 % 8 != 0 {
+    self.padlen = 8 - (self.lsn.0 % 8) as u32;
}
Contributor Author

Could someone educate me about what's happening here? It looks like:

  • If "switch record" (what's that?), then compute padding to the end of the segment.
  • If the current LSN isn't a multiple of 8, then compute padding to the next 8-byte boundary, overwriting the padding computed above (?!)

How much of this should be captured into Lsn member functions? We could add something like Lsn::pad_to(chunk_size: u64) that could be used for both.
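A minimal sketch of that helper (the pad_to name comes from this comment; it's not necessarily what ends up in the branch):

#[derive(Debug, Clone, Copy)]
pub struct Lsn(pub u64);

impl Lsn {
    /// Number of padding bytes needed to reach the next multiple of `chunk_size`.
    /// Returns 0 when the LSN is already aligned.
    pub fn pad_to(self, chunk_size: u64) -> u64 {
        let remainder = self.0 % chunk_size;
        if remainder == 0 {
            0
        } else {
            chunk_size - remainder
        }
    }
}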

Contributor

WAL in postgres is split into segments. The default size is 16MB, but unlike XLOG_BLCKSZ it can easily be changed.
Switching XLOG means starting to write to the next segment file.
Usually an XLOG switch record is not needed: a WAL segment is filled to the end and then we switch to the next segment. But in some cases we may need to force a switch to a new segment even though there is still space in the current one. It may be needed for WAL archiving.

WAL records are aligned on an 8-byte boundary. Padding is used to enforce such alignment.
Actually the code fragment can be simplified as:

self.padlen = (self.lsn.0.wrapping_neg() & 7) as u32;

Contributor

@hlinnaka commented Apr 25, 2021

If you want to see an xlog switch in action, you can force it with select pg_switch_wal();. That can come in handy in testing. See https://www.postgresql.org/docs/current/functions-admin.html.

Contributor Author

Is it a bug that the code overwrites the "switch record" padding with the 8-byte wal record padding?

Commit messages:

  • This type is a zero-cost wrapper for a u64, meant to help code communicate with precision what that value means. It implements Display and Debug. Display "{}" will format as "1234ABCD/5678CDEF" while Debug will format as Lsn(1234567890).
  • SeqWait can use any type that is Ord + Debug + Copy. Debug is not strictly necessary, but allows us to keep the panic message if a caller wants the sequence number to go backwards.
  • Use the `Lsn` type everywhere that I can find u64 being used to represent an LSN.
  • There is only one place doing subtraction, and it had a manually implemented check.
  • AtomicLsn is a wrapper around AtomicU64 that has load() and store() members that are cheap (on x86, anyway) and can be safely used in any context. This commit uses AtomicLsn in the page cache, and fixes up some downstream code that manually implemented LSN formatting. There's also a bugfix to the logging in wait_lsn, which prints the wrong lsn value.
  • There's no clear way to sum LSNs across timelines, so just remove them for now.
  • Replace open-coded math with member fns.
@ericseppanen ericseppanen merged commit 96b6f35 into neondatabase:main Apr 26, 2021
@ericseppanen ericseppanen deleted the lsn_type branch April 26, 2021 20:03